7 research outputs found

    Optimasi Parameter K Pada Algoritma K-NN Untuk Klasifikasi Prioritas Bantuan Pembangunan Desa

    Classification is the process of finding a model or function that describes and distinguishes classes or concepts in data. The k-NN (k-Nearest Neighbors) algorithm is a classification algorithm that learns from previously classified data. k-NN handles many cases well: among its strengths, it is robust to noisy training data and very effective when the training set is large. It also has several problems, one of which is that choosing k, the number of nearest neighbors, is difficult because the classification result is very sensitive to the value of k. This study builds a classification model with k-NN and focuses on determining the best value of k for the village IKG (Indeks Kesulitan Geografis, Geographic Difficulty Index) dataset. It integrates k-NN with parameter optimization (optimize parameters) based on a genetic algorithm to determine the optimal k
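    To make the sensitivity to k concrete, here is a minimal sketch of choosing k by leave-one-out accuracy. It is not the method of the paper: the abstract tunes k with a genetic algorithm, while this sketch swaps in a plain exhaustive search over a few candidate values. The function names (`knn_predict`, `loo_accuracy`, `best_k`) and the data layout, a list of `(point, label)` pairs, are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to `query`.

    `train` is a list of (point, label) pairs; points are numeric tuples.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out accuracy of k-NN on `data` for a given k."""
    hits = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # hold out the i-th sample
        hits += knn_predict(rest, point, k) == label
    return hits / len(data)


def best_k(data, candidates):
    """Candidate k with the highest leave-one-out accuracy.

    Ties are broken in favor of the smallest k (an arbitrary choice).
    """
    return max(candidates, key=lambda k: (loo_accuracy(data, k), -k))
```

    A genetic algorithm would replace the exhaustive loop in `best_k` with a population of candidate values scored by the same kind of accuracy estimate, which matters mainly when the search space is too large to enumerate.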

    Penentuan Centroid Awal Pada Algoritma K-Means Dengan Dynamic Artificial Chromosomes Genetic Algorithm Untuk Tuberculosis Dataset

    Data is important in the current era, as are data mining methods, which extract information from data. Clustering is one of the five roles of data mining and groups data by degree of similarity and minimum distance. K-Means is a popular algorithm widely used in fields such as education, health, social science, biology, and computer science. K-Means is often combined with an optimization method such as a genetic algorithm to address its sensitivity to the choice of initial centroids. However, genetic algorithms suffer from premature convergence, so their results become trapped in local optima. This study combines the Dynamic Artificial Chromosomes Genetic Algorithm (DAC GA) with K-Means to determine the initial centroids for K-Means. Experimental results show that DAC GA + K-Means outperforms both K-Means and GA + K-Means on the datasets tested, where the optimal number of clusters was 2 for two datasets and 3 for one dataset. The method obtained DBI values of 0.138, 0.279, and 0.382, Sum Square Error values of 92.56, 332.39, and 1280.68, and fitness values of 7.12, 3.57, and 2.13
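    The two cluster-quality measures reported above can be sketched as follows, assuming DBI stands for the Davies-Bouldin Index in its standard form and that clusters are given as lists of numeric tuples. The function names are hypothetical, and the abstract does not say how the fitness value is derived, so it is not reproduced here.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of numeric tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def sum_squared_error(clusters):
    """Sum of squared distances from every point to its cluster centroid."""
    return sum(
        math.dist(p, centroid(c)) ** 2 for c in clusters for p in c
    )


def davies_bouldin(clusters):
    """Davies-Bouldin Index for two or more clusters; lower is better.

    For each cluster, take the worst ratio of combined within-cluster
    scatter to between-centroid distance, then average over clusters.
    """
    cents = [centroid(c) for c in clusters]
    scatter = [
        sum(math.dist(p, m) for p in c) / len(c)
        for c, m in zip(clusters, cents)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k)
            if j != i
        )
    return total / k
```

    Both measures depend on where K-Means converges, which in turn depends on the initial centroids, so they are a natural way to compare initialization strategies.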

    Customer Segmentation with RFM Model using Fuzzy C-Means and Genetic Programming

    Customer Relationship Management (CRM) is one of the strategies a company uses to retain its customers. CRM manages interactions and supports business strategies to build mutually beneficial relationships between companies and customers. Information technology such as data mining is critical for discovering the patterns customers produce when they make transactions, and clustering is one data mining technique for finding those patterns in customer transaction data. Fuzzy C-Means (FCM) is one of the best-known and most widely used fuzzy clustering methods. It iterates to assign each data point to the appropriate cluster based on an objective function, but it can stop at a local minimum, a solution whose objective value is not the lowest in the solution set. This research aims to solve the local minimum problem in FCM using Genetic Programming (GP), an evolution-based algorithm, to produce better clusters. It compares FCM with genetic programming fuzzy c-means (GP-FCM) for customer segmentation on the Cahaya Estetika clinic dataset. GP-FCM yielded an objective function value of 20.3091, against 32.44741 for FCM. Cluster validity evaluation using the Partition Coefficient (PC), Classification Entropy (CE), and Silhouette Index likewise shows that GP-FCM produces better cluster quality than FCM
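    As a rough illustration of the quantities named above, the sketch below computes the standard FCM objective function together with the Partition Coefficient and Classification Entropy from a membership matrix. It assumes the textbook definitions, a fuzzifier `m` of 2, and a matrix `u` with one row per data point and one column per cluster; these names and defaults are assumptions, not details taken from the paper.

```python
import math


def fcm_objective(points, centers, u, m=2.0):
    """FCM objective: sum of u[i][j]**m times squared distance to center j."""
    return sum(
        (u[i][j] ** m) * math.dist(p, c) ** 2
        for i, p in enumerate(points)
        for j, c in enumerate(centers)
    )


def partition_coefficient(u):
    """Mean of squared memberships; 1.0 means a crisp partition."""
    return sum(v * v for row in u for v in row) / len(u)


def classification_entropy(u):
    """Mean membership entropy (natural log); 0.0 means a crisp partition."""
    return -sum(v * math.log(v) for row in u for v in row if v > 0) / len(u)
```

    A lower objective value, a PC closer to 1, and a CE closer to 0 all point to a sharper partition, which is the sense in which the two algorithms are compared.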

    Comparison of Information Gain and Chi-Square Selection Features For Performance Improvement of Naive Bayes Algorithm On Determining Students With No PIP Recipients at SMKN 1 Brebes

    No full text
    The Smart Indonesia Program (PIP), delivered through the Smart Indonesia Card (KIP), is issued by the government under the Ministry of Education and Culture (Kemendikbud) through the National Team for the Acceleration of Poverty Reduction (TNP2K). The program aims to help poor students obtain a proper education, prevent children from dropping out of school, and meet their school needs. Students can use the assistance for costs such as transportation to school, school supplies, and pocket money. This study compares Information Gain and Chi-Square feature selection for improving the performance of the Naive Bayes algorithm in determining which poor students are PIP recipients at SMKN 1 Brebes; it measures the accuracy of each configuration and identifies the attributes that affect accuracy. Research data were gathered from the literature and from primary and secondary sources: primary data came from a questionnaire and secondary data from document files. The data are SMKN 1 Brebes student records from 2021. The initial collection held 703 records, but not all were used because they first had to pass several data preparation stages. Naive Bayes alone achieved an accuracy of 90.31% with an AUC of 0.967. Adding Information Gain feature selection raised the accuracy to 90.88% with an AUC of 0.970, and adding Chi-Square feature selection, or both methods together, gave the same 90.88% accuracy and 0.970 AUC. Both feature selection methods therefore improve Naive Bayes accuracy by the same 0.57 percentage points, although the resulting accuracy is still less than optimal
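    For readers unfamiliar with Information Gain as a feature-selection score, the following is a minimal sketch for categorical features, using the standard entropy-based definition. The function names and the toy inputs in the usage note are hypothetical and are not drawn from the SMKN 1 Brebes data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum(
        (count / n) * math.log2(count / n)
        for count in Counter(labels).values()
    )


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a categorical feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

    For example, with labels `["yes", "yes", "no", "no"]`, a feature whose values separate the two classes perfectly scores 1 bit, while a feature whose values are split evenly across both classes scores 0. Ranking features by this score and keeping the top ones is one common way to apply it before training Naive Bayes.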

    CLUSTERING TRAFO DISTRIBUSI MENGGUNAKAN ALGORITMA SELF-ORGANIZING MAP

    One way to determine whether the load on a PLN distribution transformer is still within normal limits or overloaded is to measure the transformer's load. At PLN Area Pelayanan Jaringan Kudus, load is measured both during the day and at night. The two measurements may differ, because daytime load tends to be low while nighttime usage is higher. This makes it hard to determine whether a transformer's load is normal or overloaded. Mapping distribution transformer loads quickly and accurately calls for a data mining technique, namely clustering. This study applies the Self-Organizing Map (SOM) algorithm. SOM achieved an accuracy of 93% against daytime load measurements and 84% against nighttime load measurements; the corresponding mapping errors were 7% and 16%

    BPNN Optimization With Genetic Algorithm For Classification of Tobacco Leaves With GLCM Extraction Features

    No full text
    Tobacco leaves are one of the agricultural commodities cultivated by Indonesian farmers. Tobacco cultivation faces many obstacles in the field, one of which is declining leaf quality caused by weather. This study uses a technology-based analysis to classify the quality of tobacco leaves. It optimizes a Backpropagation Artificial Neural Network (BPNN) classifier with a genetic algorithm that determines weights for the features extracted with GLCM. In this analysis, the genetic algorithm assigned the homogeneity variable a weight of 1. The BPNN model alone reached an accuracy of 53.50% at hidden layer values of 2, 4, 5, and 7. The BPNN optimized with the genetic algorithm reached an accuracy of 64.50% at the 4th hidden layer value. Accuracy thus increased by 11 percentage points after optimization with the genetic algorithm